Abstract

Temperature control of a rapid thermal processing (RTP) system using a proposed self-constructing adaptive fuzzy inference
network (SCAFIN) is presented in this paper. First, the physical modeling of an RTP system is carried out. An integrated model is
given for the components that make up an RTP system: the lamp power dynamics, the ray-tracing model,
and the wafer thermal dynamic model. The component models are integrated in a numerical code to give a computer
simulation of the complete RTP system. The simulation can be used to investigate the interaction of the furnace, lamp
contour, and the control system. Then a direct inverse control scheme using the proposed SCAFIN is adopted to control the
temperature of the RTP system. The SCAFIN is inherently a modified TSK-type fuzzy rule-based model possessing a neural
network's learning ability. There are no rules initially in the SCAFIN. They are created and adapted as on-line learning
proceeds via simultaneous structure and parameter identification. Simulation results show that the control approach is able
to track a temporally varying temperature trajectory and, at the same time, maintain the uniformity of the spatial temperature
distribution of the wafer in the RTP system. © 1999 Elsevier Science B.V. All rights reserved.

Keywords: Fuzzy system; Adaptive fuzzy network; Structure/parameter learning; Rapid thermal process; Direct inverse
control



1. Introduction 

Rapid thermal processing (RTP) [3, 9, 14] has 
several advantages over traditional thermal process- 
ing techniques which, in the semiconductor industry, 
means batch horizontal and vertical hot-wall fur- 
naces. One advantage of RTP is that it eliminates 
the long ramp-up and ramp-down time associated 
with furnaces, enabling a significant reduction in the 
thermal budget. Another advantage of RTP is that 
it allows better control over the processing environ- 
ment (e.g., the amount of oxygen present), which is 
becoming critical in some applications. RTP is also
a single-wafer process, which is desirable in steps
such as gate-stack formation done in a cluster tool
arrangement. Today, RTP is in production use for
source/drain implant annealing (dopant activation),
contact alloying, formation of refractory nitrides and
silicides, and glass (BPSG) reflow.

The thermal cycles used in furnaces and RTP sys- 
tems obviously depend on the application, but always 
involve ramping up to a set temperature, holding the
wafer at that temperature for a set time, and ramping 
back down. Some steps are more complex, involving 
two or three different temperature set points. 
The ramp rate depends on the design
of the system, but RTP systems that use tungsten-halogen
lamps to heat the wafer can produce rates on
the order of 50-75°C/s (most vertical furnaces have
ramp rates ≤5°C/s). The main advantage of this is
that it allows a reduction in the thermal budget, which
is defined as the total amount of time that a wafer
can be held at high temperature during the fabrication
process. Fast ramp rates also help keep the throughput
of RTP systems competitive with that of large batch
furnaces, which can process 200 or more wafers per
tube, especially for steps where the process
time-at-temperature is short. However, rapid ramp rates
also present some new challenges. One of the most
common is that the wafer's edge heats and cools at
a different rate than the rest of the wafer. This can lead
to temperature nonuniformities across the wafer and
a problem called slip. Temperature measurement and
control are also more difficult. Most of today's RTP
development efforts center around temperature
measurement and control. Wafer temperature nonuniformities
and temperature control indeed remain critical
issues for RTP. Numerous recent papers have dealt
with the two faces of this problem [2, 12, 14]. Temperature
control involves maintaining spatial uniformity
across the wafer and tracking temporally varying
temperature profiles. Since the RTP system is complex
and the operating point may change with time, it is
difficult to meet the temperature control requirement
by traditional control methods. We shall show in
Section 4 that undesirable control results are observed when
traditional control methods are used on the RTP system.

Recently, the advent of fuzzy logic controllers
(FLC) and neural controllers based on multilayered
backpropagation neural networks (BPNNs) has
opened new possibilities for the realization of
better and more efficient control [5, 10]. They offer
a key advantage over traditional adaptive control
systems: they do not require mathematical models of the
plants. The concept of fuzzy logic has been applied
successfully to the control of industrial processes
[5]. Conventionally, the selection of fuzzy if-then
rules often relies on a substantial amount of heuristic
observation to express the knowledge of a proper control strategy.
Obviously, it is difficult for human experts to examine
all the input-output data from a complex system
to find a number of proper rules for the FLC. For
a BPNN, its nonlinear mapping and self-learning
abilities have been the motivating factors for its
use in developing intelligent control systems [10].
However, slow convergence is the major disadvantage
of the BPNN. Moreover, when it is trained on-line in
order to adapt to variations in the environment, its
global tuning property usually leads to the over-tuned
phenomenon, which degrades the performance of
the controller [18]. In [18], a new on-line training
scheme is proposed, but it requires additional training
of the adjacent patterns at each sampling time, which
increases the computational load. In this paper,
a self-constructing adaptive fuzzy inference network
(SCAFIN) is proposed to overcome the disadvantages
of the BPNN and FLC. Owing to the local tuning
property of the SCAFIN, the over-tuned phenomenon of
the BPNN can be overcome.

The SCAFIN is a fuzzy rule-based network pos- 
sessing learning ability. Compared with other existing 
neural fuzzy networks [4, 17, 19], a major charac- 
teristic of the network is that no preassignment and 
design of the rules are required. The rules are con- 
structed automatically during the on-line operation. 
Two learning phases, the structure as well as the pa- 
rameter learning phases are adopted on-line for the 
construction task. One important task in the struc- 
ture identification of the SCAFIN is the partition 
of the input space, which influences the number of 
fuzzy rules generated. Traditional partitioned results 
are shown in Figs. 1(a) and (b). Fig. 1(a) is a grid- 
type partitioned result [4, 19]. A major problem of
this kind of partitioning is that the number of fuzzy
rules increases exponentially as the dimension of 
the input space increases. Fig. 1(b) is a clustering- 
based partitioned result [8, 11, 19]. Compared with 
the grid-type partition, the number of rules is 
reduced by this method, but not the number of 
membership functions in each dimension. In fact, 
by observing the projected membership functions 
in Fig. 1(b), we can find that some membership 
functions projected from different clusters have high 
overlapping degrees. These highly overlapping mem- 
bership functions can be eliminated. An on-line input 
space partitioning method, the aligned clustering- 
based method, is proposed in this paper. The on-line 
partitioned result is shown in Fig. 1(c). This method 
will reduce not only the number of rules generated but
also the number of fuzzy sets in each dimension. Another
feature of the SCAFIN is that it can optimally
determine the consequent part of fuzzy if-then rules
during the structure learning phase.

Fig. 1. Fuzzy partitions of a two-dimensional input space: (a) grid-type partitioning; (b) clustering-based partitioning; (c) proposed aligned
clustering-based partitioning.

A fuzzy rule of the following form is adopted in our system initially,



$$\text{Rule } i:\ \text{IF } x_1 \text{ is } A_{i1} \text{ and } \cdots \text{ and } x_n \text{ is } A_{in}\ \text{THEN } y_i \text{ is } m_i, \tag{1}$$



where $x_i$ and $y_i$ are the input and output variables,
respectively, $A_{ij}$ is a fuzzy set, and $m_i$ is the
position of a symmetric membership function of
the output variable with its width neglected during
the defuzzification process. Then, by monitoring the
change of the network output error, additional terms
(the linear terms used in the consequent part of the
TSK model [17]) will be included when necessary to
further reduce the output error. This consequent-identification
process is employed in conjunction with the
precondition-identification process to reduce both the
number of rules and the number of consequent terms.
For the parameter-identification scheme, the consequent
parameters are tuned by the recursive least-squares
(RLS) algorithm, and the precondition parameters
are tuned by the backpropagation learning algorithm.
Both the structure and parameter learning are done
simultaneously to achieve fast learning. The SCAFIN is
used to control the temperature of an RTP system in this
paper to achieve two control objectives: temperature
trajectory following and temperature uniformity on
the wafer.

This paper is organized as follows. In Section 2, 
physical modeling of the RTP system is performed. 
In Section 3, the configuration of the SCAFIN-based 
control and the training process are introduced. In 
Section 4, simulation studies on temperature control 
of the RTP system using SCAFIN are presented. The 
conclusion is made in Section 5. 



2. Modeling of the RTP system 

Physical modeling of the RTP system is worth
doing for the following reasons:

• First, the spatial definition of the thermal maps
achievable by computation is much better than
what is feasible through multi-point measurement.

• Second, the accuracy of the currently available
temperature sensors is not sufficient for finely optimizing
thermal uniformity.

• Third, the relative effects of influencing parameters,
together with the impact of new hardware
arrangements, are easier and cheaper to assess by
computation.

• Finally, thermal models of RTP processors allow
temperature controllers to be tested and developed
without the need for an RTP processor, which
decreases costs and avoids the problems associated
with temperature sensors.

There have been a number of papers [3, 9, 16]
concerning the analysis or modeling of the wafer
temperature distribution during RTP. However, only the
heat transfer on the wafer is simulated in these papers.
The importance of the interface (lamp dynamics,
sampling, analog-to-digital and digital-to-analog
conversions) between the controlling computer and the RTP
processor when implementing the software on the actual
equipment is ignored. The lamp transfer function that
we propose takes this into account, and a global
model of the RTP system is used for off-line
simulation.

The RTP system considered in this paper is shown 
in Fig. 2. In Fig. 2, a bank of tungsten-halogen lamps 
mounted below a diffusely reflecting ceiling consti- 
tutes the heat source. Cooling air is forced over the 
lamps to prevent the quartz sheaths from overheating. 
Fig. 2. Schematic of the RTP system.



Two quartz plates separate the lamps from the lower 
half of the oven. The wafer rests on three quartz pins 
above the black, water-cooled oven floor. The side
walls of the bottom half of the oven are partially re- 
flective and are at an angle to the vertical. A pyro- 
meter views the bottom surface of the wafer through 
a central hole in the floor. 

A mathematical model of the closed-loop RTP system
is described here. The model is called a global
model because it simulates all the components in the
RTP system and can thus be used to investigate the
interplay of the control system with the heat transfer
to the wafers as well as the thermal dynamics of the
wafer itself. A simplified schematic of the closed-loop
system is shown in Fig. 3. The system uses one bank
of lamps, which are arranged in orthogonal directions.
The lamps are placed outside the reaction chamber's
quartz windows. A flat reflector is located behind the
bank of lamps. The system is controlled by a feedback
control loop, which uses the difference between the
converted temperature $T_c$ and the set temperature $T_t$
to control lamp power. The constituent components of
the global model are shown in Fig. 4. The components
include the wafer thermal dynamic model (in particular,
the heat transfer to and from the wafer) and the lamp
dynamic and ray-tracing model for the dynamics of lamp
power to the wafer. A power supply that provides
the power to the lamps is also included. In the following,
the mathematical model used for each component is
described separately. These models are then integrated
into a global model.

The approach employed is analytical/numerical in
that the heat transfer to, from, and within the wafer
is calculated. Included in the calculation are the
radiation heat transfer to the wafer, the heat conduction
within the wafer, and the heat convection and the heat
loss emitted from the wafer surfaces. The present
application is low-pressure chemical vapor deposition
(LPCVD). This process typically runs at a pressure of
0.3-5 torr. At the densities associated with this pressure,
convective cooling of the wafer is of secondary
importance. The
present application is for temperatures above 300°C,
and hence the wafer is opaque to lamp radiation [15].
For the radiation, the heat from the lamps is absorbed
at the wafer surface, and the radiant heat loss occurs
at the surface.

Fig. 3. Schematic of the closed-loop RTP system.

Fig. 4. Global model components of the closed-loop RTP system.

As shown in Fig. 3, the controller sends a voltage
command to the power amplifier after receiving the
measured temperature signal from the thermocouple (or
pyrometer). In the ideal case, the power supply reaches
the steady voltage level immediately after receiving the
control voltage command. In practice, however, the voltage
ramps up or down to reach the steady-state level. The lamp
dynamics describes the power from the lamps after they
receive the power supply voltage. In most published papers,
the dynamics from the control voltage command to the
lamp power intensity was neglected, and the power from
the lamps was assumed to be directly proportional to the
power supply voltage. For this reason, we present a simple dynamic
model between the command voltage and lamp power.
The presented lamp dynamics has the following form:

"('-(■^)). 

$$P_{\mathrm{lamp}} = f(V(t), T(x,t)) \cdot V(t)^2, \tag{2}$$

where $V(t)$ is the instantaneous power supply voltage,
$\mathrm{com}_k$ is the present command sent by the controller,
$\mathrm{com}_{k-1}$ is the command at the last time step,
$P_{\mathrm{lamp}}$ is the lamp power, and
the function $f(V, T)$ varies with $V(t)$ and the temperature
$T(x, t)$ at position $x$. The presented lamp dynamics
raises the complexity of the RTP simulation, but it brings
the simulated RTP system closer to the actual RTP dynamics.

In our RTP system, electrical energy is supplied
to a "ring" cylindrical arrangement of tungsten-halogen
bulbs, of which more will be mentioned later.
Energy is radiated through a quartz window onto
a thin semiconductor wafer. A model of the heat
transfer for such a system is developed in cylindrical
coordinates, where the origin of the coordinate
system is the center of the wafer bottom surface, and
the z-axis of the coordinate system coincides with the
central axis of the wafer. The model is based on the
assumption that the temperature distribution is
axisymmetric and that the wafer is thin enough that
axial (z-axis) thermal gradients can be neglected.
Furthermore, the wafer is discretized into annular zones,
in each of which the temperature is assumed to be
uniform. Such an approach is often used in radiative
heat transfer applications and has been used for RTP
systems and for furnaces in [12].

The heat-transfer model of the wafer takes into
account convective, conductive, and radiative energy
transport mechanisms. The model is written as

$$\dot{T} = -A_{\mathrm{rad}} T^4 - A_{\mathrm{conv}} (T - T_a) - A_{\mathrm{cond}} T + BP, \tag{3}$$

where $T_a$ is the ambient temperature expressed as an
$N \times 1$ vector (the ambient temperature is assumed to
be constant in the chamber), $T$ is the $N \times 1$ temperature
vector of the wafer elements, and $P$ is the $M \times 1$
lamp power vector, where $N$ is the number of
wafer segments and $M$ is the number of lamps [3].
The matrices $A_{\mathrm{rad}}$, $A_{\mathrm{conv}}$, and $A_{\mathrm{cond}}$ represent the
radiative, convective, and conductive heat transfer,
respectively. A complete description of these matrices
can be found in [3]. The capacitive effects of the thick
windows are neglected here, since the associated time
constant is two orders of magnitude larger than that of the
wafer. Instead, the window heating is treated
as a slowly varying disturbance for the purpose
of system identification and controller design. The physical
parameters used in the RTP model are the same as
those used in [2].
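
As an illustration of how the lumped wafer model of Eq. (3) can be stepped forward in time, the following minimal sketch applies a forward-Euler update to a small number of annular zones. The function names, the choice of the Euler scheme, and all numerical values are assumptions made for this example only; the actual matrices are those described in [3], and temperatures are taken in kelvin so that the fourth-power term is meaningful.

```python
def wafer_rhs(T, P, A_rad, A_conv, A_cond, B, T_amb):
    """Right-hand side of dT/dt = -A_rad T^4 - A_conv (T - T_a) - A_cond T + B P.

    T and T_amb are lists of N zone temperatures (kelvin), P is a list of M
    lamp powers, and the matrices are nested lists (N x N, or N x M for B).
    """
    n = len(T)
    T4 = [t ** 4 for t in T]  # element-wise fourth power for the radiative term
    rhs = []
    for i in range(n):
        radiative = sum(A_rad[i][j] * T4[j] for j in range(n))
        convective = sum(A_conv[i][j] * (T[j] - T_amb[j]) for j in range(n))
        conductive = sum(A_cond[i][j] * T[j] for j in range(n))
        heating = sum(B[i][m] * P[m] for m in range(len(P)))
        rhs.append(-radiative - convective - conductive + heating)
    return rhs


def euler_step(T, P, dt, A_rad, A_conv, A_cond, B, T_amb):
    """One forward-Euler step of the wafer model (an assumed integration scheme)."""
    rhs = wafer_rhs(T, P, A_rad, A_conv, A_cond, B, T_amb)
    return [t + dt * r for t, r in zip(T, rhs)]


if __name__ == "__main__":
    # Hypothetical two-zone wafer heated by a single lamp; values are illustrative only.
    zeros = [[0.0, 0.0], [0.0, 0.0]]
    A_cond = [[1.0, -1.0], [-1.0, 1.0]]  # conduction exchanges heat between the zones
    B = [[2.0], [1.0]]                   # the lamp heats zone 0 more strongly than zone 1
    T = [300.0, 300.0]
    for _ in range(10):
        T = euler_step(T, [5.0], 0.01, zeros, zeros, A_cond, B, [300.0, 300.0])
    print(T)
```

A small time step is needed in practice because the radiative term makes the system stiff at high temperature; an implicit or adaptive integrator would be a reasonable alternative.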

3. SCAFIN-based adaptive control 

In this section, the structure of the SCAFIN as 
shown in Fig. 5 is introduced. With this five-layered 
network structure of the SCAFIN, we shall de- 
fine the function of each node of the SCAFIN in 
Section 3.1, and the learning algorithm of the SCAFIN 
in Section 3.2. 

3.1. Structure of the SCAFIN

Let $u^{(k)}$ and $a^{(k)}$ denote the input and output of
a node in layer $k$, respectively. The functions of the
nodes in each of the five layers of the SCAFIN are
described as follows.

Layer 1 : No computation is done in this layer. Each 
node in this layer, which corresponds to one input 
variable, only transmits input values to the next layer 
directly, i.e., 

$$a^{(1)} = u_i^{(1)} = x_i. \tag{4}$$

Layer 2: Each node in this layer corresponds to one
linguistic label (small, large, etc.) of one of the input
variables in Layer 1. In other words, the membership
value, which specifies the degree to which an input
value belongs to a fuzzy set, is calculated in Layer 2.
With the use of a Gaussian membership function, the
operation performed in this layer is

$$a^{(2)} = \exp\left(-\frac{(u_i^{(2)} - m_{ij})^2}{\sigma_{ij}^2}\right), \tag{5}$$

where $m_{ij}$ and $\sigma_{ij}$ are, respectively, the center (or
mean) and the width (or variance) of the Gaussian
membership function of the $j$th term of the $i$th input
variable. Unlike other clustering-based partitioning
methods, where each input variable has the same
number of fuzzy sets, the number of fuzzy sets of
each input variable is not necessarily identical in the
SCAFIN.

Fig. 5. Structure of the proposed self-constructing adaptive fuzzy inference network (SCAFIN).

Layer 3: A node in this layer represents one fuzzy 
logic rule and performs precondition matching of 
a rule. Here, we use the following AND operation for 
each Layer-3 node, 



$$a^{(3)} = \prod_{i=1}^{n} u_i^{(3)}, \tag{6}$$



where n is the number of Layer-2 nodes participating 
in the IF part of the rule. 

Layer 4: This layer is called the consequent layer.
Two types of nodes are used in this layer, and they
are denoted as blank and shaded circles in Fig. 5,
respectively. The node denoted by a blank circle (blank
node) is the essential node representing a fuzzy set
(described by a Gaussian membership function) of
the output variable. Only the center of each Gaussian
membership function is delivered to the next layer for
the LMOM (local mean of maximum) defuzzification
operation [6], and the width is used for output
clustering only. Different nodes in Layer 3 may be
connected to the same blank node in Layer 4, meaning that
the same consequent fuzzy set is specified for different
rules. The function of the blank node is



(7) 



where $a_{0i} = m_{0i}$, the center of a Gaussian membership
function. As to the shaded node, it is generated only
when necessary. Each node in Layer 3 has its own
corresponding shaded node in Layer 4. One of the
inputs to a shaded node is the output delivered from
Layer 3, and the other possible inputs (terms) are
the input variables from Layer 1. The shaded node
function is



(8) 
where the summation is over all the inputs and $a_{ji}$ is the
corresponding parameter. Combining these two types
of nodes in Layer 4, we obtain the whole function
performed by this layer for each rule as




Layer 5: Each node in this layer corresponds to one
output variable. The node integrates all the actions
recommended by Layers 3 and 4 and acts as a defuzzifier
with

$$a^{(5)} = \frac{\sum_i a_i^{(4)}}{\sum_i a_i^{(3)}}. \tag{10}$$
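
The five layers described above can be sketched as a single forward pass. This is a minimal illustration rather than the authors' implementation: the rule representation (dictionaries with the hypothetical keys `m`, `sigma`, `a0`, and `a`) is invented for the example, and the consequent of each rule is assumed to take the usual TSK form of a center plus optional linear terms in the inputs.

```python
import math


def scafin_forward(x, rules):
    """Forward pass of a small SCAFIN-like network for one output.

    x     : list of input values (Layer 1 passes them through unchanged).
    rules : list of dicts with hypothetical keys
            "m"     - Gaussian centers, one per input (Layer 2)
            "sigma" - Gaussian widths, one per input (Layer 2)
            "a0"    - center of the consequent fuzzy set (blank node, Layer 4)
            "a"     - optional linear coefficients (shaded node, Layer 4)
    """
    numerator = 0.0
    denominator = 0.0
    for rule in rules:
        # Layers 2 and 3: product of Gaussian membership values (firing strength).
        firing = 1.0
        for xi, mi, si in zip(x, rule["m"], rule["sigma"]):
            firing *= math.exp(-((xi - mi) ** 2) / si ** 2)
        # Layer 4: assumed TSK-type consequent, center plus optional linear terms.
        consequent = rule["a0"] + sum(aj * xj for aj, xj in zip(rule.get("a", []), x))
        numerator += firing * consequent
        denominator += firing
    # Layer 5: weighted average; assumes at least one rule fires (denominator > 0).
    return numerator / denominator


if __name__ == "__main__":
    demo_rules = [
        {"m": [-1.0], "sigma": [1.0], "a0": 0.0},
        {"m": [1.0], "sigma": [1.0], "a0": 10.0, "a": [0.5]},
    ]
    print(scafin_forward([0.2], demo_rules))
```

Because the output is a firing-strength-weighted average, a pattern close to one rule center is dominated by that rule's consequent, which is the local tuning property mentioned in the introduction.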

3.2. Learning algorithms for the SCAFIN 

Two types of learning, structure and parameter 
learning, are used concurrently for constructing the 
SCAFIN. The structure learning includes both the 
precondition and consequent structure identification 
of a fuzzy if-then rule. There are no rules (i.e., no 
nodes in the network except the input/output nodes) 
in the SCAFIN initially. They are created dynam- 
ically as learning proceeds upon receiving on-line 
incoming training data by performing the following 
learning processes simultaneously: (A) input/output 
space partitioning, (B) construction of fuzzy rules, 
(C) consequent structure identification, (D) parameter 
identification. In the above, processes A-C belong 
to the structure-learning phase and process D belongs 
to the parameter-learning phase. The details of these 
learning processes are described in the rest of this 
section. 

A. Input/output space partitioning: The way the
input space is partitioned determines the number of
rules extracted from training data as well as the number
of fuzzy sets on the universe of discourse of
each input variable. For each incoming pattern $x$, the
strength with which a rule is fired can be interpreted as the
degree to which the incoming pattern belongs to the corresponding
cluster. For computational efficiency, we can use the
firing strength given in Eq. (6) directly as this degree
measure,

$$F^i(x) = \prod_i u_i^{(3)} = e^{-[D_i(x - m_i)]^T [D_i(x - m_i)]}, \tag{11}$$

where $F^i \in [0, 1]$, $D_i = \mathrm{diag}(1/\sigma_{i1}, 1/\sigma_{i2}, \ldots, 1/\sigma_{in})$,
and $m_i = (m_{i1}, m_{i2}, \ldots, m_{in})^T$. Using this measure, we
can obtain the following criterion for the generation
of a new fuzzy rule. Let $x(t)$ be the newly incoming
pattern. Find

$$J = \arg\max_{1 \le j \le c(t)} F^j(x), \tag{12}$$

where $c(t)$ is the number of existing rules at time
$t$. If $F^J \le \bar{F}(t)$, then a new rule is generated, where
$\bar{F}(t) \in (0, 1)$ is a prespecified threshold that decays
during the learning process. Once a new rule is
generated, the initial centers and widths are set as

$$m_{(c(t)+1)} = x, \tag{13}$$

$$D_{(c(t)+1)} = -\frac{1}{\beta}\, \mathrm{diag}\left(\frac{1}{\ln(F^J)}, \ldots, \frac{1}{\ln(F^J)}\right), \tag{14}$$

according to the first-nearest-neighbor heuristic [6],
where $\beta > 0$ decides the overlap degree between two
clusters.
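
The rule-generation criterion of Eqs. (11)-(14) can be sketched as follows. This is an illustrative reading under stated assumptions: the function names and the dictionary layout are hypothetical, the threshold is passed in as a constant per call (the text says it decays during learning, but the decay schedule is not given here), and a small floor on the firing strength is added only to guard against numerical underflow in the logarithm.

```python
import math


def firing_strength(x, center, width):
    """Eq. (11): exp(-sum(((x_i - m_i) / sigma_i)^2)) for one cluster."""
    return math.exp(-sum(((xi - mi) / si) ** 2 for xi, mi, si in zip(x, center, width)))


def present_pattern(x, rules, threshold, beta, sigma_init):
    """Add a new rule (cluster) for pattern x if no existing rule fires strongly enough.

    Returns True when a rule was generated. `rules` is modified in place.
    """
    if not rules:
        # First pattern: center at x, prespecified initial width.
        rules.append({"center": list(x), "width": [sigma_init] * len(x)})
        return True
    best = max(firing_strength(x, r["center"], r["width"]) for r in rules)  # Eq. (12)
    if best > threshold:
        return False  # x is already covered by an existing cluster
    best = max(best, 1e-300)  # numerical guard (an addition, not from the paper)
    new_width = -beta * math.log(best)  # Eq. (14): sigma = -beta * ln(F^J) > 0
    rules.append({"center": list(x), "width": [new_width] * len(x)})  # Eq. (13)
    return True


if __name__ == "__main__":
    rules = []
    for pattern in ([0.0, 0.0], [0.1, 0.0], [3.0, 0.0]):
        created = present_pattern(pattern, rules, threshold=0.5, beta=0.5, sigma_init=1.0)
        print(pattern, "new rule" if created else "covered")
```

The width assigned by Eq. (14) grows with the distance to the nearest existing cluster, so isolated patterns receive broad membership functions and patterns near existing clusters receive narrow ones.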

After a rule is generated, the next step is to
decompose the multidimensional membership function
formed in Eqs. (13) and (14) into the corresponding
one-dimensional membership functions for each input
variable. For the Gaussian membership function used
in the SCAFIN, the task can be easily done as

$$e^{-[D_i(x - m_i)]^T [D_i(x - m_i)]} = \prod_j e^{-(x_j - m_{ij})^2 / \sigma_{ij}^2}, \tag{15}$$

where $m_{ij}$ and $\sigma_{ij}$ are, respectively, the projected
center and width of the membership function in
each input dimension. To reduce the number of
fuzzy sets of each input variable and to avoid the
existence of redundant ones, we should check the
similarities between the newly projected membership
function and the existing ones in each input
dimension. Since bell-shaped membership functions
are used in the SCAFIN, we use the formula
of the similarity measure, $E(A, B)$, of two
fuzzy sets, $A$ and $B$, derived previously (see [7]
for details), where $0 \le E(A, B) \le 1$ and the larger
$E(A, B)$ is, the more similar fuzzy set $A$ is to $B$.
Let $\mu(m_i, \sigma_i)$ represent the Gaussian membership
function with center $m_i$ and width $\sigma_i$. The whole
algorithm for the generation of new fuzzy rules
as well as fuzzy sets in each input dimension
is as follows. Suppose no rules exist initially.

IF $x$ is the first incoming pattern THEN do

PART 1.

{ Generate a new rule,
  with center $m_1 = x$, width $D_1 = \mathrm{diag}(1/\sigma_{init}, \ldots, 1/\sigma_{init})$,
  where $\sigma_{init}$ is a prespecified constant.
  After decomposition, we have $n$ one-dimensional membership functions,
  with $m_{1i} = x_i$ and $\sigma_{1i} = \sigma_{init}$, $i = 1, \ldots, n$.
}

ELSE for each newly incoming $x$, do

PART 2.

{ find $J = \arg\max_{1 \le j \le c(t)} F^j(x)$,
  IF $F^J \ge \bar{F}(t)$
    do nothing
  ELSE
  { generate a new fuzzy rule, with
    $m_{c(t)+1} = x$, $D_{c(t)+1} = -\frac{1}{\beta}\,\mathrm{diag}(1/\ln(F^J), \ldots, 1/\ln(F^J))$.
    After decomposition, we have
    $m_{new\text{-}i} = x_i$, $\sigma_{new\text{-}i} = -\beta \cdot \ln(F^J)$, $i = 1, \ldots, n$.
    Do the following fuzzy measure for each input variable $i$:
    { $degree(i, t) \equiv \max_{1 \le j \le k_i} E[\mu(m_{new\text{-}i}, \sigma_{new\text{-}i}), \mu(m_{ji}, \sigma_{ji})]$,
      where $k_i$ is the number of partitions of the $i$th input variable.
      IF $degree(i, t) \le \alpha(t)$,
      THEN adopt this new membership function, and set $k_i = k_i + 1$,
      ELSE set the projected membership function as the closest one. }
  }
}.

In the above algorithm, $\alpha(t)$ is a scalar similarity
criterion which is monotonically decreasing such that
higher similarity between two fuzzy sets is allowed in
the initial stage of learning. For the output space
partitioning, the same measure in Eq. (12) is used. Since
the criterion for the generation of a new output cluster
is related to the construction of a rule, we shall
describe it together with the rule construction process in
Process B below.



B. Construction of fuzzy rules: As mentioned in
learning process A, the generation of a new input cluster
corresponds to the generation of a new fuzzy rule,
with its precondition part constructed by the learning
algorithm in Process A. At the same time, we have
to decide the consequent part of the generated rule.
Suppose a new input cluster is formed after the
presentation of the current input-output training pair $(x, d)$;
then the consequent part is constructed by the following
algorithm:

IF there are no output clusters,
  do { PART 1 in Process A, with $x$ replaced by $d$ }
ELSE
  do {
    find $J = \arg\max_j F^j(d)$.
    IF $F^J \ge \bar{F}(t)$
      connect input cluster $c(t + 1)$ to the existing
      output cluster $J$,
    ELSE
      generate a new output cluster,
      do the decomposition process in PART 2 of
      Process A,
      connect input cluster $c(t + 1)$ to the
      newly generated output cluster.
  }.

The algorithm is based on the fact that the precon- 
ditions of different rules may be mapped to the same 
consequent fuzzy set. Compared to the general fuzzy 
rule-based models with singleton output, where each 
rule has its own individual singleton value [11, 19], 
fewer parameters are needed in the consequent part 
of the SCAFIN, especially for the case with a large 
number of rules. 

C. Consequent structure identification: Up to
now, the SCAFIN contains fuzzy rules in the form of
Eq. (1). Even though such a basic SCAFIN can be
used directly for system modeling, a large number of
rules are necessary for modeling sophisticated systems
under a tolerable modeling accuracy. To cope with this
problem, we adopt the spirit of the TSK model [17]
into the SCAFIN. In the TSK model, each consequent
part is represented by a linear equation of the input
variables. It is reported in [17] that the TSK model
can model a sophisticated system using a few rules.
Even so, if the number of input and output variables
is large, the number of consequent terms used in the output is
considerable, and some of them may be superfluous.
To cope with the dilemma between the number
of rules and the number of consequent terms, instead
of using the linear equation of all the input variables
(terms) in each rule, we add these additional terms
only to some rules when necessary. The idea is based
on the fact that for different input clusters, the
corresponding output mapping may be simple or complex.
For a simple mapping, a rule with a singleton output
is enough, while for a complex mapping, a rule with
a linear equation in the consequent part is needed. The
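criterion for deciding which type of consequent part
should be used for each rule is based on computing

$$RE(i) = \sum_t \left(y^d(t) - y(t)\right)^2 \frac{a_i^{(3)}}{\sum_{k=1}^{c} a_k^{(3)}}, \tag{16}$$

where $a_i^{(3)}$ is the firing strength of rule $i$, $c$ is the
number of rules, $y^d(t)$ is the desired output, $y(t)$ is
the current output, and $RE(i)$ is the accumulated error
caused by rule $i$. By monitoring the error curve, if the
error does not diminish over a period of time and the
error is still too large, we shall add linear combinations
of input variables to the rules whose $RE(i)$ values are
larger than a predefined threshold value.

D. Parameter identification: The parameter
identification process is done concurrently with the structure
identification process. The idea of backpropagation
is used for this supervised learning. Considering the
single-output case for clarity, our goal is to minimize
the error function

$$E = \frac{1}{2}\left(y^d(t) - y(t)\right)^2, \tag{17}$$

where $y^d(t)$ is the desired output, and $y(t)$ is the
current output. The parameters, $a_{ji}$, in Layer 4 are tuned
by the RLS algorithm as

$$a(t+1) = a(t) + P(t+1)\, u(t+1) \left(y^d(t) - y(t)\right), \tag{18}$$

$$P(t+1) = \frac{1}{\lambda}\left[P(t) - \frac{P(t)\, u(t+1)\, u^T(t+1)\, P(t)}{\lambda + u^T(t+1)\, P(t)\, u(t+1)}\right], \tag{19}$$

where $0 < \lambda \le 1$ is the forgetting factor, $u$ is the
current input vector, $a$ is the corresponding parameter
vector, and $P$ is the covariance matrix. The initial
parameter vector $a(0)$ is determined in the structure
learning phase and $P(0) = \sigma I$, where $\sigma$ is a large
positive constant. As to the free parameters $m_{ij}$ and $\sigma_{ij}$
of the input membership functions in Layer 2, they are
updated by the backpropagation algorithm. Using the
chain rule, we have

$$m_{ij}^{(2)}(t+1) = m_{ij}^{(2)}(t) - \eta \frac{\partial E}{\partial m_{ij}^{(2)}}, \tag{20}$$

where $\eta$ is the learning rate and

$$\frac{\partial E}{\partial m_{ij}^{(2)}} = \sum_k \frac{\partial E}{\partial a_k^{(3)}} \frac{\partial a_k^{(3)}}{\partial m_{ij}^{(2)}}, \tag{21}$$

$$\frac{\partial a_k^{(3)}}{\partial m_{ij}^{(2)}} = \begin{cases} a_k^{(3)} \dfrac{2(x_i - m_{ij})}{\sigma_{ij}^2} & \text{if term node } j \text{ is connected to rule node } k, \\ 0 & \text{otherwise.} \end{cases} \tag{22}$$

Similarly, we have

$$\sigma_{ij}^{(2)}(t+1) = \sigma_{ij}^{(2)}(t) - \eta \frac{\partial E}{\partial \sigma_{ij}^{(2)}}, \tag{23}$$

where

$$\frac{\partial a_k^{(3)}}{\partial \sigma_{ij}^{(2)}} = \begin{cases} a_k^{(3)} \dfrac{2(x_i - m_{ij})^2}{\sigma_{ij}^3} & \text{if term node } j \text{ is connected to rule node } k, \\ 0 & \text{otherwise.} \end{cases} \tag{24}$$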
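
The recursive least-squares update used for the consequent parameters can be sketched in a few lines. This is a generic RLS step under stated assumptions rather than the authors' code: the function name is hypothetical, the predicted output is taken to be the linear form $u^T a$ (in the SCAFIN the current output $y(t)$ is the network output), and the covariance matrix is assumed symmetric, which holds when it is initialized as a multiple of the identity.

```python
def rls_update(a, P, u, y_desired, lam=1.0):
    """One recursive least-squares step with forgetting factor lam (0 < lam <= 1).

    a : current parameter vector (list of n floats)
    P : current covariance matrix (n x n nested list, assumed symmetric)
    u : current input (regressor) vector
    Returns the updated (a, P).
    """
    n = len(a)
    # P u, reused for both the numerator and the denominator of the covariance update.
    Pu = [sum(P[i][j] * u[j] for j in range(n)) for i in range(n)]
    denom = lam + sum(u[i] * Pu[i] for i in range(n))
    # P(t+1) = (1/lam) [P - P u u^T P / (lam + u^T P u)]; symmetry of P gives u^T P = (P u)^T.
    P_new = [[(P[i][j] - Pu[i] * Pu[j] / denom) / lam for j in range(n)] for i in range(n)]
    # Prediction error with the current parameters (assumed linear output u^T a).
    error = y_desired - sum(ai * ui for ai, ui in zip(a, u))
    gain = [sum(P_new[i][j] * u[j] for j in range(n)) for i in range(n)]
    a_new = [a[i] + gain[i] * error for i in range(n)]
    return a_new, P_new


if __name__ == "__main__":
    # Identify y = 2*u1 + 3*u2 from three noiseless samples; P(0) is a large multiple of I.
    a = [0.0, 0.0]
    P = [[1e6, 0.0], [0.0, 1e6]]
    for u, y in (([1.0, 0.0], 2.0), ([0.0, 1.0], 3.0), ([1.0, 1.0], 5.0)):
        a, P = rls_update(a, P, u, y)
    print(a)
```

Starting from a large $P(0)$ makes the first updates behave almost like a direct least-squares fit, which is consistent with the choice of a large positive constant for the initial covariance.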



3.3. Direct inverse control scheme

The direct inverse control configuration shown in
Fig. 6 is adopted. Two training phases, off-line and
on-line training, are used for the design of the
controller. For the off-line training, the general
inverse-modeling learning scheme [13] is used. A sequence
of random input signals $u_{rd}(k)$ under the magnitude
limits of the plant input is injected directly into the
plant, and then an open-loop input-output
characteristic of the plant is obtained. According to the
input-output characteristic of the plant, proper training
patterns are selected to cover the entire reference
output space. Using the collected training patterns
with the values of the selected input variables as the
input pattern and the corresponding control signal
as the target pattern, the network can be updated
in a supervised manner to minimize an error function $E$ defined
by $E = \sum_{k=1}^{N} \frac{1}{2}[u_{rd}(k) - \hat{u}_{rd}(k)]^2$, where $N$ is the
number of training patterns, and $\hat{u}_{rd}(k)$ is the actual output
of the training network.

Fig. 6. Block diagram of the on-line training scheme.

For the on-line training, a conventional on-line
training scheme is used; Fig. 6 is its block diagram.
The scheme alternates between two phases, a control phase
and a training phase. In the control phase, switches S1
and S2 are connected to node 1 and node 2, respectively,
to form a control loop. In this loop, the control signal
$u(k)$ is generated according to the input vector
$I'(k) = [y_r(k+1), y_p(k), \ldots, y_p(k-m+1), u(k-1), \ldots, u(k-n)]^T$,
where $u$ denotes the input, $y_p$ is the output, and
$y_r$ is the reference output. In the training phase,
switches S1 and S2 are connected to node 3 and node 4,
respectively, to form a training loop. In this loop, a
training pattern is defined with input vector
$I(k) = [y_p(k+1), y_p(k), \ldots, y_p(k-m+1), u(k-1), \ldots, u(k-n)]^T$
and desired output $u(k)$, where the input vector of the
network controller has the same form as that used in the
off-line training scheme. With this training pattern, the
network controller is trained in a supervised manner at
each time step $k$ to minimize the error function
$E(k+1) = \frac{1}{2} [u(k) - \hat{u}(k)]^2$, where
$\hat{u}(k)$ is the actual output of the network controller
when it receives the input vector $I(k)$ in the training
phase.
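The alternation between the control phase and the training phase can be sketched in Python as follows. This is only an illustration under stated assumptions: `toy_plant` is a hypothetical first-order linear plant rather than the RTP model, the controller is a two-weight linear map rather than the SCAFIN, and the weight update is a normalized gradient step on E(k+1) chosen for numerical robustness. The initial weights stand for an imperfect off-line training result.

```python
import numpy as np

def toy_plant(y, u):
    """Hypothetical first-order plant (an assumption, not Eq. (3))."""
    return 0.9 * y + 0.1 * u

def online_inverse_control(y_ref, w_init, mu=1.0, eps=1e-12):
    """Alternate the control phase and the training phase at every step."""
    w = np.array(w_init, dtype=float)
    y = np.zeros(len(y_ref))
    for k in range(len(y_ref) - 1):
        # Control phase: the input vector uses the reference y_r(k+1).
        u = w @ np.array([y_ref[k + 1], y[k]])
        y[k + 1] = toy_plant(y[k], u)
        # Training phase: the input vector uses the measured y_p(k+1),
        # and the control signal just applied is the desired output.
        x = np.array([y[k + 1], y[k]])
        u_hat = w @ x
        # Normalized gradient step on E(k+1) = 0.5 * (u - u_hat)**2.
        w += mu * (u - u_hat) * x / (eps + x @ x)
    return y, w

# Ramp-and-hold reference, loosely shaped like a temperature profile.
y_ref = np.concatenate([np.linspace(0.0, 100.0, 101), np.full(200, 100.0)])
y, w = online_inverse_control(y_ref, w_init=[8.0, -6.0])
final_error = abs(y[-1] - y_ref[-1])
```

Because the control signal applied at step k is, by construction, the input that produced the measured y_p(k+1), every training pattern is a valid sample of the plant inverse, so the controller keeps improving while it is in use.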



4. Simulation studies 

4.1. RTP temperature control 

Fig. 7 shows a classical desired temperature profile
for a thermal process. The wide operating interval of
300-1 800°C typically implies variable dynamics. The 
control objective is to control the temperature of the 
RTP system to follow the trajectory in Fig. 7. 

Consider first that the lamp is located at the inner
ring of the wafer only. To design the multi-input-single-output
(MISO) SCAFIN controller, both off-line and on-line
training are adopted. In implementing the off-line training
scheme, a sequence of random input signals $u_{rd}(k)$
limited between 0 and 1000 is injected directly into the
simulated system described in Eq. (3). From the input-output
characteristic of the simulated system (Fig. 8), 150 training
patterns are selected to cover the entire reference
output space. The input vector of the network controller
is $I(k) = [y_p(k+1), y_p(k)]^T$. We first tried the
SCAFIN without linear terms added to the consequent
part. Simulation results showed that such a network
could not learn the input-output relationship well,
even when a large number of rules were used. The
reason is that, in this learning task, the desired outputs
change sharply for similar inputs (Fig. 8). To handle
this mapping, some additional terms selected via
learning process C in Section 3.2 are added to the
consequent part of the generated rules.

Fig. 7. Typical desired temperature profile.

Fig. 8. Collected training data for inverse control, where the upper plot is y(k) and the lower one is the corresponding input signal.

The learning
parameters in the SCAFIN are set as η = 0.005,
0 = 0.5, Fn= 0.005, ^ = 0.7, and Jl=0.96. After 
20 epochs of off-line training, the controlled result is
shown in Fig. 9. Fig. 10 illustrates the distribution
of the training patterns and the final assignment of
fuzzy rules in the $[y(k), y(k+1)]$ plane. Eight rules are
generated, with additional linear terms generated on
four of them during learning, and the numbers of fuzzy
sets on $y(k)$ and $y(k+1)$ are 5 and 5 (Fig. 11),
respectively. To obtain a better result, on-line learning
is also performed. After 5 epochs of on-line training,
the controlled result is shown in Fig. 12. A better
result is achieved.

To give a clearer picture of the performance,
comparisons with other controllers are made. They include
backpropagation (BP) neural network based direct inverse
control [13], model reference adaptive control (MRAC) [1],
and proportional-derivative (PD) control. For the MRAC,
the plant is identified with an AutoRegressive eXogenous
(ARX) model, and a state-space control approach based on
the identified model is employed. For the BP controller,
after extensive off-line training, the number of on-line
training epochs is the same as that for the SCAFIN, which
is five in total. The controlled results using PD, MRAC,
and BP are shown in Figs. 13-15, respectively. Detailed
comparisons of these controllers, including the
nonuniformity (nuf), maximum positive error, maximum
negative error, and mean square tracking error, are given
in Table 1. In Table 1, the nonuniformity is defined
as the mean square error between the wafer center
and marginal measurements. The mean square tracking
error refers to the error between the reference and
wafer center measurements.

Fig. 9. Performance of SCAFIN controller for inverse control; solid: center measurement, dashdot: marginal measurement. (Panels: performance of SCAFIN controller for inverse control; the difference between reference and measured temperature. Horizontal axis: time.)

Fig. 10. Final learned fuzzy rules of SCAFIN inverse controller.
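The performance indices defined above are simple to compute from recorded trajectories. The Python sketch below is one possible reading of those definitions; the function name `performance_indices` and the sign convention of the tracking error (reference minus center measurement) are assumptions, not taken from the original simulation code.

```python
import numpy as np

def performance_indices(y_ref, y_center, y_margin):
    """Compute the indices of Table 1 from three equally long trajectories."""
    y_ref = np.asarray(y_ref, dtype=float)
    y_center = np.asarray(y_center, dtype=float)
    y_margin = np.asarray(y_margin, dtype=float)
    # Assumed sign convention: reference minus wafer-center measurement.
    error = y_ref - y_center
    return {
        # Mean square error between center and marginal measurements.
        "nonuniformity": float(np.mean((y_center - y_margin) ** 2)),
        "max_positive_error": float(error.max()),
        "max_negative_error": float(error.min()),
        # Mean square error between reference and center measurement.
        "tracking_mse": float(np.mean(error ** 2)),
    }

# Tiny made-up trajectories, for illustration only.
idx = performance_indices([1.0, 2.0, 3.0], [1.0, 2.5, 2.0], [0.0, 2.5, 2.0])
```

With these made-up values the center and margin differ only at the first sample, so the nonuniformity is 1/3, while the tracking errors are 0, -0.5, and 1.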

4.2. Improvement of temperature uniformity 

From the above MISO controlled results, we observe
that the temperature gradient between the inner and
outer rings of the wafer is not small. To obtain uniform
processing across the wafer surface and to prevent slip
defects caused by thermal stress, the temperature must be
nearly uniform across the wafer at all times. It is known
that, because of radiative loss at the wafer edge and
nonuniform convective cooling, the distribution of energy
from the lamp or lamp array of an RTP system must be
nonuniform over the wafer to obtain a uniform temperature
distribution. If the distribution of lamp energy used
during transients is simply a scaled version of what
provides steady-state temperature uniformity, serious
temperature nonuniformity will occur during transients.
For most processes, it is of paramount importance to avoid
temperature gradients during both transients and steady
states. Wafer edge irradiation must therefore be adjusted
to the experimental conditions. For that purpose, various
means have been suggested, for example, modification of
reflector characteristics, special lamp arrangements,
individual lamp powering, or mechanical movement of the
wafer. Thermal gradients may also be reduced by using
either guard rings or a susceptor, i.e., by virtually
extending the wafer edge; this technique will be applied
in our RTP system later. Finally, instead of using a guard
ring, we will present how to use the multi-input-multi-output
(MIMO) SCAFIN to control the lamps' power individually to
improve temperature uniformity.

Fig. 11. Final learned membership functions of SCAFIN inverse controller.

Fig. 12. Improved performance of MISO SCAFIN controller after on-line learning; solid: center measurement, dashdot: marginal measurement. (Panel title: performance of on-line SCAFIN controller for inverse control.)

Fig. 14. Performance of model reference adaptive controller (MRAC) for tracking control; solid: center measurement, dashdot: marginal measurement.
Table 1
Summary table of performance index

Performance index                  MIMO SONFIN   MISO SONFIN   MISO BP    MRAC       PD control
Nonuniformity (nuf)                0.0133        449.4375      446.2482   350.4692   436.2865
nuf at guard ring covered wafer    -             136.5641      135.7573   86.5752    74.5269
Maximum tracking error             5.1086        6.0851        10.5223    37.2696    4.69E-04
Minimum tracking error             -9.663        -16.2234      -18.4636   -35.2887   -37.8156
Tracking MSE                       12.6401       8.2991        46.1927    183.1709   521.6536



To avoid the edge heat loss, the general approach
is to add a guard ring at the wafer border to extend the
wafer radius. The radiation energy lost at the wafer edge
is reflected by the guard ring, and the difference
between the center and edge energy is reduced. However,
this method reduces the temperature gradient only to a
limited extent and is not suitable for improving the
nonuniformity caused by the lamp radiation. The effect
of using a guard ring is simulated here. Fig. 16 shows
the improvement of temperature uniformity under the MISO
SCAFIN inverse control for the wafer covered with a guard
ring. The corresponding improvements for all types of
controllers discussed above are listed in Table 1.



It is difficult for a single-output controller to
achieve temperature uniformity and tracking
simultaneously. To improve uniformity, a circular bank of
lamps is added over the wafer border to compensate for the
edge loss effect of the wafer. These lamps increase the
incident radiation energy on the wafer edge but also add
energy at the center. Two well-balanced lamp power
sources usually cannot achieve the desired temperature
and temperature uniformity simultaneously. Hence, a MIMO
controller is required to control the power of the
different lamps individually. In conventional control, it
is difficult to solve the MIMO control problem, especially
for nonlinear plant control. Nevertheless, the SCAFIN is
well suited to MIMO control.

Fig. 16. Comparison of temperature uniformity of the wafer with and without guard ring; solid: center measurement, dashdot: outer measurement.

Fig. 17. Improved performance of MIMO SCAFIN controller after on-line learning; dotted: center measurement, dashdot: marginal measurement.

We add a circular bank of lamps over the edge of the
wafer. The power range of the edge lamps is the same
as that of the central lamps. To generate the training
data, 150 random signals $[u_c(k), u_o(k)]$ ($u_c(k)$ is the
inner lamp power and $u_o(k)$ is the outer lamp power)
are injected directly into the plant described in Eq. (3).
The 150 generated training patterns are used to train
the SCAFIN, with $[y_i(k), y_i(k+1), y_o(k), y_o(k+1)]$
($y_i$ is the temperature at the center ring and $y_o$ at
the outer ring) being the inputs and $[u_c(k), u_o(k)]$ the
desired outputs. Eight rules are generated after 22 epochs
of off-line learning. Five and six linear terms are added
to outputs $u_c$ and $u_o$, respectively. After 14 epochs
of on-line training, the controlled result is shown in
Fig. 17. Much better temperature uniformity is achieved.
Detailed temperature uniformity comparisons are
listed in Table 1.
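The layout of the MIMO training patterns can be sketched in Python as follows. The two-zone plant `toy_two_zone_plant` and its coefficients are hypothetical and merely stand in for the RTP model of Eq. (3); only the arrangement of inputs (center and outer temperatures at steps k and k+1) and targets (inner and outer lamp powers) follows the description above.

```python
import numpy as np

def toy_two_zone_plant(y, u):
    """Hypothetical coupled plant; y = [y_i, y_o], u = [u_c, u_o]."""
    A = np.array([[0.90, 0.00],
                  [0.00, 0.85]])
    # Made-up coupling: each lamp bank heats both zones.
    B = np.array([[0.08, 0.02],
                  [0.03, 0.07]])
    return A @ y + B @ u

def collect_mimo_patterns(n_patterns, u_max, seed=0):
    """Inject random lamp powers and build inverse-model patterns."""
    rng = np.random.default_rng(seed)
    u = rng.uniform(0.0, u_max, size=(n_patterns, 2))
    y = np.zeros((n_patterns + 1, 2))
    for k in range(n_patterns):
        y[k + 1] = toy_two_zone_plant(y[k], u[k])
    # Input pattern: [y_i(k), y_i(k+1), y_o(k), y_o(k+1)].
    inputs = np.column_stack([y[:-1, 0], y[1:, 0], y[:-1, 1], y[1:, 1]])
    # Target pattern: [u_c(k), u_o(k)].
    return inputs, u

inputs, targets = collect_mimo_patterns(150, 1000.0)
```

Each row of `inputs` pairs with the lamp powers that produced the transition from step k to step k+1, so a network trained on these rows learns a two-output inverse model.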

5. Conclusion 

An adaptive fuzzy inference network, SCAFIN, 
with on-line self-constructing capability is proposed 
in this paper. The SCAFIN is a general connectionist 
model of a fuzzy logic system, which can find its 
optimal structure and parameters automatically. Both 
the structure and parameter identification schemes 
are done simultaneously during on-line learning, so 
the SCAFIN can be used for normal operation at any 
time as learning proceeds without any assignment of 
fuzzy rules in advance. Simulation results for
temperature control of the RTP system have verified its
effectiveness.